This page last changed on Mar 03, 2006 by dblasby.
This may look like a long and daunting process, but 90% of it is
QA/QC to make sure that things didnt actually screw up! You
should basically just copy-and-paste the SQL commands from here
and have everything work!
I'm not sure if this will always work, but it appears to be working for me.
NOTE: this is untested on the entire dataset – use at own risk!
TO FIX POLYGONS
---------------
0. everything is in a GEOS enabled postgis
1. assume you've loaded TIGER (using ogr2ogr)
2. assume you've loaded the TIGER polygons (using tigerpoly.py) table "poly"
3. assume you've built indexes on poly and pip:
CREATE INDEX poly_spatial_inx on poly using gist (the_geom
gist_geometry_ops);
CREATE INDEX pip_spatial_inx on pip using gist (wkb_geometry
gist_geometry_ops);
CREATE INDEX poly_mod_poly on poly (module,polyid);
CREATE INDEX pip_fid_idx on poly (ogc_fid);
vacuum analyse poly;
vacuum analyse pip;
4. lets grab the "bad polygons":
create table bad_poly as select * from poly
WHERE not(isvalid(the_geom))
OR numgeometries(the_geom) >1
;
There are two types of "bad_polygons" (see attached picture):
a) multipolygons
b) just invalid polygons
The method of fixing them is similiar.
-
-
- AT THIS STAGE YOU MIGHT WANT TO CHECK ALL THE POLYGONS (I AM ESTIMATING
ABOUT 3000 FOR THE ENTIRE TIGER DATASET). THEY SHOULD FALL INTO TWO
CATEGORIES:
A) MULTIPOLYGON OVER HOLE (SEE PICTURE)
B) EXTERIOR RING TOUCHING ITSELF (IE. SHOULD BE HOLE)
IF THERE IS ANOTHER PROBLEM PLEASE ENSURE THAT THE FIXING WILL ACTUALLY
FIX IT.
5. We're going to deal with the multipolygons first. We are going to
basically convert them to valid polygons which maynot have all the
holes in them. Later we'll stick the holes back in!
a) first we make sure that each of the sub-polygons are valid.
If they are not, you've got an invalid-polygon inside an invalid
polygon and it faster to just fix by hand. I dont expect this to
happen, but lets just make sure first!
create table bad_poly_multi as select * from bad_poly
WHERE numgeometries(the_geom) >1 ;
– I'm only handling the multipolygon with 1 or 2 extra things case.
– if this returns any results, then you'll have to extend things
– to handle it (unlikely)
SELECT * from bad_poly_multi
WHERE numgeometries(the_geom) >3 ;
— there's a problem if this has any results!
SELECT * from bad_poly_multi
WHERE not(isvalid(geometryN(the_geom,2)));
— there's a problem if this has any results!
SELECT * from bad_poly_multi
WHERE not(isvalid(geometryN(the_geom,3))) and
(numgeometries(the_geom) >2);
b) okay, next we just check to make sure that the "extra" polygons are
INSIDE the 1st polygon.
— there's a problem if this has any results!
SELECT * from bad_poly_multi
WHERE
not(within(
geometryN(the_geom,2),
geometryN(the_geom,1) ) );
— there's a problem if this has any results!
SELECT * from bad_poly_multi
WHERE (numgeometries(the_geom) >2) and
not(within(
geometryN(the_geom,3),
geometryN(the_geom,1) ) );
c) okay - this means we can forget about the extra multi-parts of
the geometries!
UPDATE bad_poly_multi set the_geom =
setsrid(geometryN(the_geom,1),32767);
NOTE: these are probably NOT correct polygons!
d) we have to make sure they are correct - they probably have
missing holes (ie. where the sub-geometries are).
So, we're going to find all the polygons that are supposed to be
INSIDE this one and make sure they are.
First step is to find any missing holes in the polygon. We do this
by looking for PIP (point-in-polygon) points inside the polygons.
There should be exactly one; if there is >1 then we need to create
a hole in the polygon.
We're going to handle each hole one-at-a-time;
i) add a required hole column to the table
ALTER TABLE bad_poly_multi ADD COLUMN needed_hole geometry;
ALTER TABLE bad_poly_multi ADD COLUMN needed_hole_pip_fid int;
ALTER TABLE bad_poly_multi ADD COLUMN needed_hole_module text;
ALTER TABLE bad_poly_multi ADD COLUMN needed_hole_polyid bigint;
ii) make sure its null:
UPDATE bad_poly_multi SET needed_hole = NULL;
UPDATE bad_poly_multi SET needed_hole_module = NULL;
UPDATE bad_poly_multi SET needed_hole_polyid = NULL;
UPDATE bad_poly_multi SET needed_hole_pip_fid = NULL;
iii) find a hole to stick in it.
— first, find a (could be >1) pip point
UPDATE bad_poly_multi SET needed_hole_pip_fid =
(SELECT ogc_fid FROM pip WHERE
NOT( (pip.module = bad_poly_multi.module) AND
(pip.polyid = bad_poly_multi.polyid) )
AND wkb_geometry && the_geom
AND contains(the_geom,wkb_geometry)
LIMIT 1);
— next, find the module,polyid for that pip
UPDATE bad_poly_multi SET needed_hole_polyid =
( SELECT polyid FROM pip WHERE
ogc_fid = needed_hole_pip_fid
);
UPDATE bad_poly_multi SET needed_hole_module =
( SELECT module FROM pip WHERE
ogc_fid = needed_hole_pip_fid
);
– next, find the actually polygon
UPDATE bad_poly_multi SET needed_hole =
( SELECT the_geom FROM poly WHERE
poly.module = needed_hole_module
AND poly.polyid = needed_hole_polyid
);
--make sure the SRID is correct (ug)
UPDATE bad_poly_multi SET needed_hole = setsrid(needed_hole,32767);
iv) make sure that the geometry isnt invalid or weird!
--should return nothing
SELECT * FROM bad_poly_multi WHERE
not(isvalid(needed_hole))
OR numgeometries(needed_hole) >1 ;
v) okay, now we can put the hole in the main geometry
UPDATE bad_poly_multi SET the_geom =
difference(the_geom,needed_hole)
WHERE not(needed_hole isNULL);
vi) you might be done, you might have to repeat these steps
from (ii).
– if this isnt 0, then go back to step (ii)
select count from bad_poly_multi WHERE not(needed_hole isNULL);
e) validate
– should return none
SELECT count FROM bad_poly_multi
WHERE not(isvalid(the_geom))
OR numgeometries(the_geom) >1 ;
f) CONGRATS - YOUR MULTIPOLGONS ARE FIXED – WE'LL STICK BACK IN THE
POLY TABLE LATER
6. Now we have to deal with the "invalid" polygons. The process is very
similiar to the multipolygon case.
ISSUE: JTS AND GEOS DO NOT AGREE ON BUFFER(0)!!!
a)
CREATE table bad_poly_invalid as select * from bad_poly
WHERE numgeometries(the_geom) =1 ;
b) we want the outside boundary of the polygon:
UPDATE bad_poly_invalid set the_Geom = buffer(the_geom,0);
NOTE: this could screw up; GEOS buffer(0) isnt 100%
c) now we need to put the holes back into this polygon:
ALTER TABLE bad_poly_invalid ADD COLUMN needed_holes geometry;
d)
UPDATE bad_poly_invalid SET needed_holes = NULL;
e) we find the holes (some of these will actually be touching the polygon)
UPDATE bad_poly_invalid SET needed_holes =
(SELECT buffer(collect(geometryn(the_geom,1)),0) FROM poly WHERE
NOT( (poly.module = bad_poly_invalid.module) AND
(poly.polyid = bad_poly_invalid.polyid) )
AND bad_poly_invalid.the_geom && poly.the_geom
AND distance(bad_poly_invalid.the_geom,poly.the_geom)
<0.0000000001
);
f) make new polygons (could throw a topology exception here, but it shouldnt)
ALTER TABLE bad_poly_invalid ADD COLUMN new_poly geometry;
update bad_poly_invalid set new_poly =
setSRID(difference(the_Geom, needed_holes),32767);
g) check to make sure they're okay:
– should return none
select count from bad_poly_invalid where not(isvalid(new_poly));
– should return none
select count from bad_poly_invalid where numgeometries(new_poly)>1;
7. Final quick QA/QC
a) lets make sure that all our new polygons are okay:
i) everything valid:
--should return none:
select count from bad_poly_invalid WHERE
not(isvalid(new_poly));
select count from bad_poly_multi WHERE not(isvalid(the_geom));
ii) contains exactly one pip
ALTER TABLE bad_poly_invalid add column check_pip int;
UPDATE bad_poly_invalid SET check_pip =
(SELECT count from pip WHERE pip.wkb_geometry
&& new_poly
AND contains(new_poly,wkb_geometry)
);
--should be 0
SELECT count from bad_poly_invalid where check_pip != 1;
ALTER TABLE bad_poly_multi add column check_pip int;
UPDATE bad_poly_multi SET check_pip =
(SELECT count from pip WHERE pip.wkb_geometry && the_geom
AND contains(the_geom,wkb_geometry)
);
--should be 0
SELECT count from bad_poly_multi where check_pip != 1;
b) make a table of all the polygons nearby the ones that are "bad".
– make a table of a bunch of polygons "near" our bad ones
CREATE TABLE poly_check AS
SELECT poly.the_geom,poly.module,poly.polyid from
poly,bad_poly WHERE
poly.the_geom && bad_poly.the_geom;
– dont check the bad ones (we already have)
delete from poly_check where
module||polyid in (select module||polyid from bad_poly);
ALTER TABLE poly_check add column check_pip int;
UPDATE poly_check SET check_pip =
(SELECT count from pip WHERE pip.wkb_geometry &&
setsrid(the_geom,32767)
AND contains(the_geom,wkb_geometry)
);
--should be 0
SELECT count from poly_check where check_pip != 1;
8. re-integrate the corrected geometries back into the poly table.
The SQL for this is a bit strange to do in one stage, so it easier to create
a bunch of SQL statements and execute them.
– really pay attention to this
\t
\o run_me.sql
SELECT 'UPDATE poly SET the_geom = (SELECT
setsrid(multi(the_geom),1) FROM bad_poly_multi WHERE module =
'''||module || ''' and polyid ='||polyid ||') WHERE module
='''||module || ''' AND polyid = '||polyid ||';'
FROM bad_poly_multi
UNION
SELECT 'UPDATE poly SET the_geom = (SELECT
setsrid(multi(new_poly),1) FROM bad_poly_invalid WHERE module =
'''||module || ''' and polyid ='||polyid ||') WHERE module
='''||module || ''' AND polyid = '||polyid ||';'
FROM bad_poly_invalid;
\q
(restart psql)
\i run_me.sql
9. FINAL QA/QC
a) check for bad polys
– should be all fixed
select module,polyid from poly where not(isvalid(the_geom)) or
numgeometries(the_geom) != 1 or the_geom isnull;
b) we've already checked the corrected polygons and the polygons beside them.
We're now going to check the PIP points around the fixed polygons.
CREATE TABLE poly_check2 AS
SELECT poly.the_geom,poly.module,poly.polyid from poly,bad_poly WHERE
poly.the_geom && bad_poly.the_geom;
CREATE TABLE pip_check AS
SELECT pip.wkb_geometry,pip.module,pip.polyid from
pip,poly_check2 WHERE
pip.wkb_geometry && setsrid(poly_check2.the_geom,32767) ;
– we've now got a bunch of pip that are "near" the bad polys.
We're going
– to ensure that they are in exactly one polygon!
ALTER TABLE pip_check add column npoly int;
update pip_check set npoly = (select count from poly where
poly.the_geom && setsrid(pip_check.wkb_geometry,1)
AND
contains(poly.the_geom,setsrid(pip_check.wkb_geometry,1) )
);
select count from pip_check where npoly != 1 or npoly isnull;
10. you're done! Congratulations!
|